“AI has been around forever.”

  • IKR is my natural reaction because I'm afraid of looking… dumb.

  • I mean, sure, but what do they mean?

  • What should I know so that I can sound like a well-read technocrat when I'm making small or medium talk before dinner or with an acquaintance?


1950s-2010s: The Old AI

During what was effectively an invite-only math and science summer camp in 1956 (the Dartmouth workshop, for your dinner-party notes), a group of campers (fine, researchers) got together and basically said "let's try to make machines that think like humans."

They started with things like playing chess or proving math theorems. So, for the thinking part, they programmed explicit rules: if the chess piece is here, then move there.

Using a different example, imagine creating the rules for identifying a dog: dogs have four legs, have fur, and bark.

This approach worked for some tasks but was pretty rigid–what about a three-legged dog? A shaved poodle? A dog lying down? 
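Here's that rule book as a toy sketch in code (purely illustrative, not anyone's actual 1950s system), with the brittleness on full display:

```python
# Toy sketch of the rule-book approach to "is this a dog?"
def is_dog(legs: int, has_fur: bool, barks: bool) -> bool:
    # The explicit rules: dogs have four legs, have fur, and bark
    return legs == 4 and has_fur and barks

print(is_dog(4, True, True))   # True: textbook dog
print(is_dog(3, True, True))   # False: three-legged dog, rudely rejected
print(is_dog(4, False, True))  # False: shaved poodle, also rejected
```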

The rules broke constantly because the world is an unruly and unpredictable place, TBH.

So, we could (and did!) spend years programming rules for one specific problem, and the system couldn't do anything else (honestly, I’d bet some of you are thinking that’s maybe not all that different from how a lot of people perceive their college degree 👀).

2012-2017: Alex(Net) and friends enter the chat

In 2012 though, we went from rules to patterns.

A team at the University of Toronto (blame Canada) entered a deep neural network (Alex, aka AlexNet) in the ImageNet image-recognition competition and crushed the field, showing neural networks could learn complex patterns from data better than humans writing rules.

If you went HUH? when I said deep neural network: the "neural network" part is just the machinery that does this pattern-matching, and "deep" just means it stacks up many layers of it.

Back to the dog example because, obviously. 

Instead of using a rule book to identify dogs ("dogs have four legs, fur, and bark"), we can show a system (or a child, anyone!) thousands of pictures of dogs. Eventually, it finds patterns (certain shapes, textures, proportions) and starts recognizing dogs it's never seen before, even weird-looking ones.
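And if you want the contrast in code, here's the same toy problem learned from examples instead of hand-written rules (a sketch using scikit-learn; real image models learn from raw pixels, not three hand-picked features):

```python
# Toy sketch: learn "dog vs. not dog" from labeled examples instead of rules
from sklearn.linear_model import LogisticRegression

# Each example: [number of legs, has fur (0 or 1), barks (0 or 1)]
X = [
    [4, 1, 1],  # textbook dog
    [3, 1, 1],  # three-legged dog: still a dog!
    [4, 0, 1],  # shaved poodle: still a dog!
    [4, 1, 0],  # cat
    [2, 0, 0],  # human
]
y = [1, 1, 1, 0, 0]  # 1 = dog, 0 = not dog

model = LogisticRegression().fit(X, y)  # the model finds the patterns itself
print(model.predict([[3, 0, 1]]))       # a three-legged, shaved, barking mystery animal
```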

Why didn’t we do this before 2012? 

  • First, hardware. 

  • Graphics cards (GPUs), originally built for video games, turned out to be perfect for training neural networks, even though training has nothing to do with graphics (both just boil down to doing a lot of simple math in parallel).

  • Second, data. 

  • The internet gave us endless images and words. Neural networks are data-hungry—they Pac-Man through massive amounts of examples to find patterns. In the 1980s and 90s, researchers didn't have that.

  • Third, math.

  • Beautiful Minds came up with training tricks that made it possible to train really deep networks (many layers) without them… breaking. Before this, neural networks would kind of lose the thread when things got too complex (the technical term is "vanishing gradients," if you want to sound extra fancy before dinner).

Alex (AlexNet) wrangled pattern identification for images, but we weren't quite there yet when it came to text. 

Said another way, we were working with picture books instead of actual books at this point.

With text, we struggled with memory. Meaning: what did you just say? What did you say before that? And in the sentence before that?

This “Old AI” had problems with its memory (what, it’s old!).

Take, for example:

  • "Sarah didn't think the date would be fun. But she got dressed and went anyway. When it started, she realized she was right."

  • "Right" about what? That it wouldn't be fun—back from the first sentence. You need to remember that negative assumption, carry it through the middle sentence about getting dressed (a lot to manage, I know), and connect it to "right" at the end. That's three steps of memory.

Older AI systems processed text more like someone reading one word at a time through a tiny window. They didn't remember what you'd said before.

Beyond autocomplete on your phone, we by and large lost the plot. Autocomplete predicts the next word or two based on what you just typed: "I'm going to the..." suggests "store," "park," "gym." That worked fine because it only ever needs to look back a few words.

You say charcuterie, I say board (jk).

Anything under about 2-3 sentences worked okay. Anything longer would just become word salad.
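If you're curious what "only looks back a few words" means concretely, here's autocomplete in miniature: a toy bigram model that predicts the next word from exactly one word of memory (made-up mini corpus, obviously):

```python
from collections import Counter, defaultdict

# A made-up mini corpus to learn from
corpus = "i am going to the store . i am going to the park . i am going to the gym .".split()

# Count which word tends to follow each word
following = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    following[word][next_word] += 1

def suggest(word: str) -> list:
    # The model "remembers" exactly one word back, and that's it
    return [w for w, _ in following[word].most_common(3)]

print(suggest("the"))  # ['store', 'park', 'gym']
```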

2017-now: But wait, there’s more! The New AI

Large language models walk into the bar.

They use text (!!) instead of pictures.

Until now, they'd stayed home.

What ‘suddenly’ made these neural networks particularly useful, valuable, scalable, [insert choice word], was that a team of Googlers realized AI needed working memory to keep track of context over multiple sentences and long paragraphs (their 2017 paper is literally called "Attention Is All You Need").

Transformers (no, not that kind) are what give LLMs the ability to pay attention to more than just the preceding few words.

The same way we keep looping back to whether we turned the oven off or sent a weird work email, attention lets a neural network loop back over the text it has already seen, connecting one thought/phrase/sassy comment to another and weighing which earlier words matter for the current one.
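For the bravest dinner-party guests, here's attention in absolute miniature, reusing the Sarah example from earlier (the numbers are made up, one tiny vector per word; real models learn enormous versions of these):

```python
import numpy as np

# One made-up 4-number vector per word (real models learn these)
words = ["fun", "dressed", "went", "right"]
vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],  # "fun"
    [0.1, 0.8, 0.1, 0.0],  # "dressed"
    [0.0, 0.2, 0.9, 0.1],  # "went"
    [0.8, 0.0, 0.1, 0.3],  # "right" (deliberately similar to "fun")
])

query = vectors[-1]                 # "right" asks: which earlier words matter to me?
scores = vectors @ query            # similarity between "right" and every word
weights = np.exp(scores) / np.exp(scores).sum()  # softmax: scores become attention weights

for word, weight in zip(words, weights):
    print(f"{word:8s} {weight:.2f}")
# "fun" gets the biggest weight: "right" connects all the way back to the first sentence
```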

We (the royal We, to be sure) showed LLMs text (books, The Daily Mail, your emo Facebook status updates from college, everything) without explicit instructions, and they found patterns across millions of examples of "when someone writes this, what tends to come next?"

Some of us think this is magic, while others... don't.

But that's a topic for another day.

Next

There are truly no stupid questions (with AI).